The search functionality is under construction.

Author Search Result

[Author] Michitaka KAMEYAMA(65hit)

21-40hit(65hit)

  • Highly Parallel Collision Detection Processor for Intelligent Robots

    Michitaka KAMEYAMA  Tadao AMADA  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E75-C No:4
      Page(s):
    398-404

    In intelligent robots capable of autonomous work, the development of a high-performance special-purpose VLSI processor for collison detection will become very important for automatic motion planning. Conventionally, this kind of processing is performed by general-purpose processors. In this paper, a first collision detection VLSI processor is proposed to achieve ultrahigh-performance processing with an ideal parallel processing scheme. A large number of coordinate transformations and memory accesses to the obstacle memory are fully utilized in the processing algorithm, so that direct collision detection can be executed with a VLSI-oriented regular data flow. The structure of each processing element (PE) is very simple because a PE mainly consists of a COordinate Rotation DIgital Computer (CORDIC) arithmetic unit for the coordinate transformation and memories for the storage of manipulator and obstacle information. When 100 PE's are used to make parallel processing, the performance is about 10 000 times faster than that of conventional approaches using a single general-purpose microprocessor.

  • Design of a Matrix Multiply-Addition VLSI Processor for Robot Inverse Dynamics Computation

    Somchai KITTICHAIKOONKIT  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Dedicated Processors

      Vol:
    E74-C No:11
      Page(s):
    3819-3828

    This paper proposes the design of a matrix multiply-addition VLSI processor (MMP) for minimum-delaytime inverse dynamics computation based on linear array architecture. The MMP mainly consists of four multiply-adders, thus performing 44 matrix multiply-additions with a regular data flow. The delay time becomes minimum based on the concept of "odd-even alternative computation". VLSI-oriented architecture which supports high-speed computation of the odd-even alternative computation both in the MMP level and in the array level, is achieved through the use of two types of the data-dependence graphs. By layout evaluation, it is demonstrated that the MMP can be easily implemented in a single chip. A linear array of MMPs is capable of performing inverse dynamics computation of any manipulator with minimum-delay time. The estimated performance with regard to the delay time is the highest in the architectures reported until now.

  • FPGA Implementation of a Stereo Matching Processor Based on Window-Parallel-and-Pixel-Parallel Architecture

    Masanori HARIYAMA  Yasuhiro KOBAYASHI  Haruka SASAKI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

      Vol:
    E88-A No:12
      Page(s):
    3516-3522

    This paper presents a processor architecture for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing a window size. Window-parallel-and-pixel-parallel architecture is also proposed to achieve to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of an interconnection network between memory and functional units based on the regularity of reference pixels. The stereo matching processor is implemented on an FPGA. Its performance is 80 times higher than that of a microprocessor (Pentium4@2 GHz), and is enough to generate a 3-D depth image at the video rate of 33 MHz.

  • Evaluation of a Field-Programmable VLSI Based on an Asynchronous Bit-Serial Architecture

    Masanori HARIYAMA  Shota ISHIHARA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E91-C No:9
      Page(s):
    1419-1426

    This paper presents a novel asynchronous architecture of Field-programmable gate arrays (FPGAs) to reduce the power consumption. In the dynamic power consumption of the conventional FPGAs, the power consumed by the switch blocks and clock distribution is dominant since FPGAs have complex switch blocks and the large number of registers for high programmability. To reduce the power consumption of switch blocks and clock distribution, asynchronous bit-serial architecture is proposed. To ensure the correct operation independent of data-path lengths, we use the level-encoded dual-rail encoding and propose its area-efficient implementation. The proposed field-programmable VLSI is implemented in a 90 nm CMOS technology. The delay and the power consumption of the proposed FPVLSI are respectively 61% and 58% of those of 4-phase dual-rail encoding which is the most common encoding in delay insensitive encoding.

  • Architecture of an Asynchronous FPGA for Handshake-Component-Based Design

    Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Architecture

      Vol:
    E96-D No:8
      Page(s):
    1632-1644

    This paper presents a novel architecture of an asynchronous FPGA for handshake-component-based design. The handshake-component-based design is suitable for large-scale, complex asynchronous circuit because of its understandability. This paper proposes an area-efficient architecture of an FPGA that is suitable for handshake-component-based asynchronous circuit. Moreover, the Four-Phase Dual-Rail encoding is employed to construct circuits robust to delay variation because the data paths are programmable in FPGA. The FPGA based on the proposed architecture is implemented in a 65 nm process. Its evaluation results show that the proposed FPGA can implement handshake components efficiently.

  • Minimizing Energy Consumption Based on Dual-Supply-Voltage Assignment and Interconnection Simplification

    Masanori HARIYAMA  Shigeo YAMADERA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E89-C No:11
      Page(s):
    1551-1558

    This paper presents a design method to minimize energy of both functional units (FUs) and an interconnection network between FUs. To reduce complexity of the interconnection network, data transfers between FUs are classified according to FU types of operations in a data flow graph. The basic idea behind reducing the complexity of the interconnection network is that the interconnection resource can be shared among data transfers with the same FU type of a source node and the same FU type of a destination node. Moreover, an efficient method based on a genetic algorithm is presented.

  • Dynamic-Storage-Based Logic-in-Memory Circuit and Its Application to a Fine-Grain Pipelined System

    Hiromitsu KIMURA  Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Low-Power Technologies

      Vol:
    E85-C No:2
      Page(s):
    288-296

    A new logic-in-memory circuit is proposed for a fine-grain pipelined VLSI system. Dynamic-storage elements are distributed over a logic-circuit plane. A functional pass gate is a key component, where a linear summation and threshold function are merged compactly using charge-storage and charge-coupling effect with a DRAM-cell-based circuit structure. The use of dynamic logic based on pass-transistor network using functional pass gates makes it possible to realize any logic circuits compactly with small power dissipation. As a typical example, a 54-bit pipelined multiplier is implemented by using the proposed circuit technology. Its power dissipation and chip area are reduced to about 63 percent and 72 percent, respectively, in comparison with those of a corresponding binary CMOS implementation under 0.35-µm CMOS technology.

  • Implementation of a DRAM-Cell-Based Multiple-Valued Logic-in-Memory Circuit

    Hiromitsu KIMURA  Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Optoelectronics

      Vol:
    E85-C No:10
      Page(s):
    1814-1823

    This paper presents a multiple-valued logic-in-memory circuit with real-time programmability. The basic component, in which a dynamic storage function and a multiple-valued threshold function are merged, is implemented compactly by using charge storage and capacitive coupling with a DRAM-cell-based circuit structure under a 0.8-µm CMOS technology. The pass-transistor network using these basic components makes it possible to realize any multiple-valued-inputs binary-outputs logic circuits compactly. As a typical example, a fully parallel multiple-valued magnitude comparator is also implemented by using the proposed DRAM-cell-based pass-transistor network. Its execution time and power dissipation are reduced to about 11 percent and 29 percent, respectively, in comparison with those of a corresponding binary implementation. A prototype chip is also fabricated to confirm the basic operation of the proposed DRAM-cell-based logic-in-memory circuit.

  • FOREWORD

    Michitaka KAMEYAMA  

     
    FOREWORD

      Vol:
    E77-C No:7
      Page(s):
    1021-1022
  • Multiple-Valued Code Assignment Algorithm for VLSI-Oriented Highly Parallel k-Ary Operation Circuits

    Saneaki TAMAKI  Michitaka KAMEYAMA  

     
    PAPER-Multiple-Valued Architectures and Systems

      Vol:
    E76-C No:7
      Page(s):
    1112-1118

    Design of high-speed digital circuits such as adders and multipliers is one of the most important issues to implement high performance VLSI systems. This paper proposes a new multiple-valued code assignment algorithm to implement locally computable combinational circuits for k-ary operations. By the decomposition of a given k-ary operation into unary operations, a code assignment algorithm for k-ary operations is developed. Partition theory usually used in the design of sequential circuits is effectively employed for optimal code assignment. Some examples are shown to demonstrate the usefulness of the proposed algorithm.

  • Design of Highly Parallel Linear Digital System for ULSI Processors

    Masami NAKAJIMA  Michitaka KAMEYAMA  

     
    PAPER-Multiple-Valued Architectures and Systems

      Vol:
    E76-C No:7
      Page(s):
    1119-1125

    To realize next-generation high performance ULSI processors, it is a very important issue to reduce the critical delay path which is determined by a cascade chain of basic gates. To design highly parallel digital operation circuits such as an adder and a multiplier, it is difficult to find the optimal code assignment in the non-linear digital system. On the other hand, the use of the linear concept in the digital system seems to be very attractive because analytical methods can be utilized. To meet the requirement, we propose a new design method of highly parallel linear digital circuits for unary operations using the concept of a cycle and a tree. In the linear digital circuit design, the analytical method can be developed using a representation matrix, so that the search procedure for optimal locally computable circuits becomes very simple. The evaluations demonstrate the usefulness of the circuit design algorithm.

  • Multiple-Valued Programmable Logic Array Based on a Resonant-Tunneling Diode Model

    Takahiro HANYU  Yoshikazu YABE  Michitaka KAMEYAMA  

     
    PAPER-Multiple-Valued Architectures and Systems

      Vol:
    E76-C No:7
      Page(s):
    1126-1132

    Toward the age of ultra-high-density digital ULSI systems, the development of new integrated circuits suitable for an ultimately fine geometry feature size will be an important issue. Resonant-tunneling (RT) diodes and transistors based on quantum effects in deep submicron geometry are such kinds of key devices in the next-generation ULSI systems. From this point of view, there has been considerable interests in RT diodes and transistors as functional devices for circuit applications. Especially, it has been recognized that RT functional devices with multiple peaks in the current-voltage (I-V) characteristic are inherently suitable for implementing multiple-valued circuits such as a multiple-state memory cell. However, very few types of the other multiple-valued logic circuits have been reported so far using RT devices. In this paper, a new multiple-valued programmable logic array (MVPLA) based on RT devices is proposed for the next-generation ULSI-oriented hardware implementation. The proposed MVPLA consists of 3 basic building blocks: a universal literal circuit, an AND circuit and a linear summation circuit. The universal literal circuit can be directly designed by the combination of the RT diodes with one peak in the I-V characteristic, which is programmable by adjusting the width of quantum well in each RT device. The other basic building blocks can be also designed easily using the wired logic or current-mode wired summation. As a result, a highdensity RT-diode-based MVPLA superior to the corresponding binary implementation can be realized. The device-model-based design method proposed in this paper is discussed using static characteristics of typical RT diode models.

  • Task Allocation with Algorithm Transformation for Reducing Data-Transfer Bottlenecks in Heterogeneous Multi-Core Processors: A Case Study of HOG Descriptor Computation

    Hasitha Muthumala WAIDYASOORIYA  Daisuke OKUMURA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E93-A No:12
      Page(s):
    2570-2580

    Heterogeneous multi-core processors are attracted by the media processing applications due to their capability of drawing strengths of different cores to improve the overall performance. However, the data transfer bottlenecks and limitations in the task allocation due to the accelerator-incompatible operations prevents us from gaining full potential of the heterogeneous multi-core processors. This paper presents a task allocation method based on algorithm transformation to increase the freedom of task allocation. We use approximation methods such as CORDIC algorithms to map the accelerator-incompatible operations to accelerator cores. According to the experimental results using HOG descriptor computation, the proposed task allocation method reduces the data transfer time by more than 82% and the total processing time by more than 79% compared to the conventional task allocation method.

  • Low-Power Field-Programmable VLSI Using Multiple Supply Voltages

    Weisheng CHONG  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Low Power Methodology

      Vol:
    E88-A No:12
      Page(s):
    3298-3305

    A low-power field-programmable VLSI (FPVLSI) is presented to overcome the problem of large power consumption in field-programmable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells that are based on a bit-serial pipeline architecture which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, where the cells in non-critical paths use a low supply voltage for low power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages.

  • Design of Robust-Fault-Tolerant Multiple-Valued Arithmetic Circuits and Their Evaluation

    Takeshi KASUGA  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E76-C No:3
      Page(s):
    428-435

    Robust-fault tolerance is a property that a computational result becomes nearly equal to the correct one at the occurrence of faults in digital system. There are many cases where the safety of digital control systems can be maintained if the property is satisfied. In this paper, robust-fault-tolerant three-valued arithmetic modules such as an adder and a multiplier are proposed. The positive and negative integers are represented by the number of 1's and 1's, respectively. The design concept of the arithmetic modules is that a fault makes linearly additive effect with a small value to the final result. Each arithmetic module consists of identical submodules linearly connected, so that multi-stage structure is formed to generate the final output from the last submodule. Between the input and output digits in the submodule some simple functional relation is satisfied with respect to the number of 1's and 1's. Moreover, the output digit value depends on very small portion of the submodules including the input digits. These properties make the linearly additive effect with a small value to the final result in the arithmetic modules even if multiple faults are occurred at the input and output of any gates in the submodules. Not only direct three-valued representation but also the use of three-valued logic circuits is inherently suitable for efficient implementation of the arithmetic VLSI system. The evaluation of the robust-fault-tolerant three-valued arithmetic modules is done with regard to the chip size and the speed using the standard CMOS design rule. As a result, it is made clear that the chip size can be greatly reduced.

  • Prospects of Multiple-Valued VLSI Processors

    Takahiro HANYU  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    INVITED PAPER

      Vol:
    E76-C No:3
      Page(s):
    383-392

    Rapid advances in integrated circuit technology based on binary logic have made possible the fabrication of digital circuits or digital VLSI systems with not only a very large number of devices on a single chip or wafer, but also high-speed processing capability. However, the advance of processing speeds and improvement in cost/performance ratio based on conventional binary logic will not always continue unabated in submicron geometry. Submicron integrated circuits can handle multiple-valued signals at high speed rather than binary signals, especially at data communication level because of the reduced interconnections. The use of nonbinary logic or discrete-analog signal processing will not be out of the question if the multiple-valued hardware algorithms are developed for fast parallel operations. Moreover, in VLSI or ULSI processors the delay time due to global communications between functional modules or chips instead of each functional module itself is the most important factors to determine the total performance. Locally computable hardware implementation and new parallel hardware algorithms natural to multiple-valued data representation and circuit technologies are the key properties to develop VLSI processors in submicron geometry. As a result, multiple-valued VLSI processors make it possible to improve the effective chip density together with the processing speed significantly. In this paper, we summarize several potential advantages of multiple-valued VLSI processors in submicron geometry due to great reduction of interconnection and due to the suitability to locally computable hardware implementation, and demonstrate that some examples of special-purpose multiple-valued VLSI processors, which are a signed-digit arithmetic VLSI processor, a residue arithmetic VLSI processor and a matching VLSI processor can achieve higher performance for real-world computing system.

  • Multiple-Valued VLSI Image Processor Based on Residue Arithmetic and Its Evaluation

    Makoto HONDA  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E76-C No:3
      Page(s):
    455-462

    The demand for high-speed image processing is obvious in many real-world computations such as robot vision. Not only high throughput but also small latency becomes an important factor of the performance, because of the requirement of frequent visual feedback. In this paper, a high-performance VLSI image processor based on the multiple-valued residue arithmetic circuit is proposed for such applications. Parallelism is hierarchically used to realize the high-performance VLSI image processor. First, spatially parallel architecture that is different from pipeline architecture is considered to reduce the latency. Secondly, residue number arithmetic is introduced. In the residue number arithmetic, data communication between the mod mi arithmetic units is not necessary, so that multiple mod mi arithmetic units can be completely separated to different chips. Therefore, a number of mod mi multiply adders can be implemented on a single VLSI chip based on the modulus-slice concept. Finally, each mod mi arithmetic unit can be effectively implemented in parallel structure using the concept of a pseudoprimitive root and the multiple-valued current-mode circuit technology. Thus, it is made clear that the throughout use of parallelism makes the latency 1/3 in comparison with the ordinary binary implementation.

  • Architecture of a Parallel Multiple-Valued Arithmetic VLSI Processor Using Adder-Based Processing Elements

    Katsuhiko SHIMABUKURO  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E76-C No:3
      Page(s):
    463-471

    An adder-based arithmetic VLSI processor using the SD number system is proposed for the applications of real-time computation such as intelligent robot system. Especially in the intelligent robot control system, not only high throughput but also small latency is a very important subject to make quick response for the sensor feedback situation, because the next input sample is obtained only after the robot actually moves. It is essential in the VLSI architecture for the intelligent robot system to make the latency as small as possible. The use of parallelism is an effective approach to reduce the latency. To meet the requirement, an architecture of a new multiple-valued arithmetic VLSI processor is developed. In the processor, addition and subtraction are performed by using the single adderbased processing element (PE). More complex basic arithmetic operations such as multiplication and division are performed by the appropriate data communications between the adder-based PEs with preserving their parallelism. In the proposed architecture, fine-grain parallel processing at the adder-based PE level is realized, and all the PEs can be fully utilized for any parallel arithmetic operations according to adder-based data dependency graph. As a result, the processing speed will be greatly increased in comparison with the conventional parallel processors having the different kinds of the arithmetic PEs such as an adder, a multiplier and a divider. To realize the arithmetic VLSI processor using the adder-based PEs, we introduce the signed-digit (SD) number system for the parallel arithmetic operations because the SD arithmetic has the advantage of modularity as well as parallelism. The multiple-valued bidirectional currentmode technology is also used for the implementation of the compact and high-speed adder-based PE, and the reduction of the number of the interconnections. It is demonstrated that these advantges of the multiple-valued technology are fully used for the implementation of the arithmetic VLSI processor. As a result, the latency of the proposed multiple-valued processor is reduced to 25% that of the binary processor integrated in the same chip size.

  • Evaluation of an FPGA-Based Heterogeneous Multicore Platform with SIMD/MIMD Custom Accelerators

    Yasuhiro TAKEI  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2576-2586

    Heterogeneous multi-core architectures with CPUs and accelerators attract many attentions since they can achieve power-efficient computing in various areas from low-power embedded processing to high-performance computing. Since the optimal architecture is different from application to application, finding the most suitable accelerator is very important. In this paper, we propose an FPGA-based heterogeneous multi-core platform with custom accelerators for power-efficient computing. Using the proposed platform, we evaluate several applications and accelerators to identify many key requirements of the applications and properties of the accelerators. Such an evaluation is very important to select and optimize the most suitable accelerator according to the requirements of an application to achieve the best performance.

  • Design of a Multiple-Valued VLSI Processor for Digital Control

    Katsuhiko SHIMABUKURO  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Computer Hardware and Design

      Vol:
    E75-D No:5
      Page(s):
    709-717

    It is well known that the multiple-valued signed-digit (SD) arithmetic circuits have the attractive features of compactness and high-speed operation. However, both of these features have yet to be utilized fully. In this paper, we consider the application of a parallel-structure-based VLSI processor. A high-performance parallel-structure-based multiple-valued VLSI processor using the radix-2 SD number system is proposed. Its compactness makes the parallelism high under chip size limitations in comparison with the ordinary binary arithmetic circuits. Moreover, the speed of the single arithmetic module is very high in the SD arithmetic circuits, so that we can take advantage of the high-speed operation in the parallel-structure-based VLSI processor chip. The multiple-valued bidirectional current-mode technology is used not only in high-speed small sized arithmetic circuits, but also in reducing the number of connections in the parallel-structure-based VLSI processor. The proposed processor is specially developed for real-time digital control, where the performance is evaluated by delay time. Performance estimation using SPICE simulators shows that the delay time of proposed processor for matrix operations such as matrix multiplication is greatly reduced in comparison with a conventional binary processor.

21-40hit(65hit)